I chose the cardox data from the astsa package. This data is from 1958 to 2018 of Monthly Carbon Dioxide Levels at Mauna Loa.
data(
list = "cardox",
package = "astsa"
)
First I created a plot of the time series with a horizontal line for the sample mean of the data and a trend line from regression.
sample_mean <- mean(cardox)
lm_df <- lm(
formula = cardox~time(cardox)
)
summary(
object = lm_df
)
##
## Call:
## lm(formula = cardox ~ time(cardox))
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.8154 -2.9392 -0.5391 2.6590 11.1524
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.732e+03 1.734e+01 -157.6 <2e-16 ***
## time(cardox) 1.552e+00 8.720e-03 178.0 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.129 on 727 degrees of freedom
## Multiple R-squared: 0.9776, Adjusted R-squared: 0.9775
## F-statistic: 3.168e+04 on 1 and 727 DF, p-value: < 2.2e-16
astsa::tsplot(
x = cardox,
ylab = "Carbon Dioxide Levels at Mauna Loa",
col = 4,
lwd = 1,
main = "Cardox Data"
)
abline(
h = sample_mean,
col = 2
)
abline(
reg = lm_df,
col = 1
)
legend("topleft", legend=c("Original Data", "Sample Mean", "Simple Linear Regression"), col=c("blue", "red", "black"), lwd=1:1:1, bty = "n")
Using the decompose function in R we can see the original time series, trend, seasonality, and randomness in the data.
plot(decompose(cardox))
As we can see from the plots that there is a clear increasing trend and there also appears to be seasonality.
Since the data has an increasing trend and has seasonality I will do 1 degree of non-seasonal differencing and 1 degree of seasonal differencing and see how the data looks after.
diff_cardox <- diff(cardox, differences = 1)
astsa::acf2(
series = diff_cardox
)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF 0.7 0.24 -0.21 -0.46 -0.51 -0.5 -0.51 -0.45 -0.19 0.24 0.71 0.92
## PACF 0.7 -0.48 -0.31 -0.06 -0.12 -0.4 -0.54 -0.51 -0.35 -0.22 0.18 0.39
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.71 0.24 -0.20 -0.45 -0.51 -0.49 -0.50 -0.44 -0.20 0.25 0.70 0.91
## PACF 0.17 -0.01 0.02 0.04 0.03 0.11 0.03 -0.02 -0.07 -0.05 0.02 0.12
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF 0.70 0.23 -0.20 -0.45 -0.50 -0.49 -0.49 -0.43 -0.19 0.25 0.69 0.89
## PACF 0.05 -0.11 0.03 0.00 0.03 0.02 0.02 0.08 0.03 0.00 0.05 0.05
## [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.68 0.22 -0.20 -0.44 -0.49 -0.48 -0.48 -0.42 -0.18 0.24 0.69 0.87
## PACF 0.02 -0.06 0.01 0.04 0.05 -0.01 0.02 0.04 0.00 -0.05 0.06 0.03
diff2_seasonal_cardox = diff(
x = diff_cardox,
lag = 12
)
astsa::acf2(
series = diff2_seasonal_cardox
)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF -0.32 0.01 -0.08 -0.02 0.06 -0.02 0.00 -0.01 0.11 -0.07 0.16 -0.46
## PACF -0.32 -0.11 -0.12 -0.10 0.01 -0.01 -0.01 -0.01 0.12 0.01 0.18 -0.40
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.12 0.06 -0.02 0.04 -0.08 0.05 -0.04 0.00 -0.06 0.03 -0.01 -0.02
## PACF -0.17 -0.02 -0.07 -0.04 -0.04 0.02 -0.01 -0.03 0.01 -0.01 0.07 -0.27
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF 0.02 -0.07 0.08 -0.03 0.04 -0.02 -0.03 0.03 0.00 0.05 -0.02 0.02
## PACF -0.14 -0.09 -0.02 -0.02 -0.01 0.04 -0.05 -0.03 -0.01 0.04 0.06 -0.18
## [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.03 0.00 -0.05 -0.01 0.02 -0.02 0.01 0.02 -0.02 -0.02 0.05 -0.03
## PACF -0.02 -0.05 -0.04 -0.06 -0.02 -0.02 -0.12 0.00 -0.05 0.00 0.09 -0.12
astsa::tsplot(
x = cbind(
cardox, diff_cardox, diff2_seasonal_cardox
),
main = "Time series plots of differenced Cardox data"
)
We can see after 1 degree of non-seasonal differencing and 1 degree of seasonal differencing that the time series plot looks stationary. We can also see that the ACF cuts off at 1 while the PACF seems to trail off with some indication that there is underlying seasonal trend.
A reasonable seasonal trend for my data is annual. Therefore, my seasonal period is year and the subinterval is month.
First we can visualize the data with subinterval as the horizontal axis, and the observed value as the vertical axis. The first plot is the original data, then non-seasonal differencing, then lastly seasonal differencing.
monthplot(
x = cardox
)
monthplot(
x = diff_cardox
)
monthplot(
x = diff2_seasonal_cardox
)
Then we can visualize the data with the seasonal period as the horizontal axis, and observed value as the vertical axis. Here we have the orignal data.
cardox_monthly_list <- split(cardox, f = cycle(cardox))
# Set up the layout for the subplots
par(mfrow = c(3, 4))
# Loop through each month and create a separate plot
for (i in 1:12) {
# Create a time series object for the current month
monthly_ts <- cardox_monthly_list[[i]]
# Plot the monthly time series with the month names as the x-axis labels
astsa::tsplot(
x = monthly_ts,
xlab = "Year",
ylab = "CO2 Levels",
main = month.abb[i]
)
}
Here the plots are updated after performing non-seasonal differencing and seasonal differencing on the data.
diff2_cardox_monthly_list <- split(diff2_seasonal_cardox, f = cycle(diff2_seasonal_cardox))
# Set up the layout for the subplots
par(mfrow = c(3, 4))
# Loop through each month and create a separate plot
for (i in 1:12) {
# Create a time series object for the current month
monthly_ts <- diff2_cardox_monthly_list[[i]]
# Plot the monthly time series with the month names as the x-axis labels
astsa::tsplot(
x = monthly_ts,
xlab = "Year",
ylab = "Differenced CO2 Levels",
main = month.abb[i]
)
}
Looking at the ACF and PACF I fit six seasonal autoregressive integrated models.
ARIMA(1,1,0) x seasonal ARIMA(1,1,0)[12]
fit1 <- astsa::sarima(cardox, p = 1, d = 1, q = 0,P = 1, D = 1, Q = 0, S = 12)
## initial value -0.833682
## iter 2 value -1.016921
## iter 3 value -1.016976
## iter 4 value -1.016978
## iter 4 value -1.016978
## iter 4 value -1.016978
## final value -1.016978
## converged
## initial value -1.011038
## iter 2 value -1.011050
## iter 3 value -1.011051
## iter 3 value -1.011051
## iter 3 value -1.011051
## final value -1.011051
## converged
fit1
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 sar1
## -0.3291 -0.4811
## s.e. 0.0353 0.0331
##
## sigma^2 estimated as 0.1318: log likelihood = -292.05, aic = 590.1
##
## $degrees_of_freedom
## [1] 714
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.3291 0.0353 -9.3127 0
## sar1 -0.4811 0.0331 -14.5234 0
##
## $AIC
## [1] 0.8241552
##
## $AICc
## [1] 0.8241787
##
## $BIC
## [1] 0.8433186
ARIMA(1,1,0) x seasonal ARIMA(2,1,0)[12]
fit2 <- astsa::sarima(cardox, p = 1, d = 1, q = 0,P = 2, D = 1, Q = 0, S = 12)
## initial value -0.832530
## iter 2 value -1.027055
## iter 3 value -1.074395
## iter 4 value -1.079637
## iter 5 value -1.079654
## iter 6 value -1.079659
## iter 6 value -1.079659
## iter 6 value -1.079659
## final value -1.079659
## converged
## initial value -1.067746
## iter 2 value -1.067775
## iter 3 value -1.067775
## iter 4 value -1.067775
## iter 4 value -1.067775
## iter 4 value -1.067775
## final value -1.067775
## converged
fit2
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 sar1 sar2
## -0.3400 -0.6468 -0.3403
## s.e. 0.0352 0.0361 0.0365
##
## sigma^2 estimated as 0.1172: log likelihood = -251.43, aic = 510.87
##
## $degrees_of_freedom
## [1] 713
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.3400 0.0352 -9.6453 0
## sar1 -0.6468 0.0361 -17.9379 0
## sar2 -0.3403 0.0365 -9.3240 0
##
## $AIC
## [1] 0.7134998
##
## $AICc
## [1] 0.7135469
##
## $BIC
## [1] 0.7390511
ARIMA(3,1,0) x seasonal ARIMA(1,1,0)[12]
fit3 <- astsa::sarima(cardox, p = 3, d = 1, q = 0,P = 1, D = 1, Q = 0, S = 12)
## initial value -0.833327
## iter 2 value -1.018565
## iter 3 value -1.027807
## iter 4 value -1.030047
## iter 5 value -1.030079
## iter 6 value -1.030080
## iter 6 value -1.030080
## iter 6 value -1.030080
## final value -1.030080
## converged
## initial value -1.023892
## iter 2 value -1.023969
## iter 3 value -1.023969
## iter 3 value -1.023969
## iter 3 value -1.023969
## final value -1.023969
## converged
fit3
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 ar3 sar1
## -0.3786 -0.1544 -0.1146 -0.4804
## s.e. 0.0372 0.0394 0.0372 0.0331
##
## sigma^2 estimated as 0.1284: log likelihood = -282.8, aic = 575.6
##
## $degrees_of_freedom
## [1] 712
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.3786 0.0372 -10.1897 0.0000
## ar2 -0.1544 0.0394 -3.9177 0.0001
## ar3 -0.1146 0.0372 -3.0800 0.0021
## sar1 -0.4804 0.0331 -14.5082 0.0000
##
## $AIC
## [1] 0.8039063
##
## $AICc
## [1] 0.8039848
##
## $BIC
## [1] 0.8358454
ARIMA(0,1,1) x seasonal ARIMA(0,1,1)[12]
fit4 <- astsa::sarima(cardox, p = 0, d = 1, q = 1,P = 0, D = 1, Q = 1, S = 12)
## initial value -0.829273
## iter 2 value -1.063862
## iter 3 value -1.100292
## iter 4 value -1.125518
## iter 5 value -1.132304
## iter 6 value -1.132783
## iter 7 value -1.134864
## iter 8 value -1.134927
## iter 9 value -1.134931
## iter 9 value -1.134931
## iter 9 value -1.134931
## final value -1.134931
## converged
## initial value -1.152690
## iter 2 value -1.156302
## iter 3 value -1.157090
## iter 4 value -1.158203
## iter 5 value -1.158305
## iter 6 value -1.158310
## iter 7 value -1.158310
## iter 7 value -1.158310
## iter 7 value -1.158310
## final value -1.158310
## converged
fit4
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 sma1
## -0.3875 -0.8641
## s.e. 0.0390 0.0192
##
## sigma^2 estimated as 0.09634: log likelihood = -186.61, aic = 379.22
##
## $degrees_of_freedom
## [1] 714
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.3875 0.0390 -9.9277 0
## sma1 -0.8641 0.0192 -45.1205 0
##
## $AIC
## [1] 0.5296369
##
## $AICc
## [1] 0.5296604
##
## $BIC
## [1] 0.5488003
ARIMA(1,1,1) x seasonal ARIMA(1,1,1)[12]
fit5 <- astsa::sarima(cardox, p = 1, d = 1, q = 1,P = 1, D = 1, Q = 1, S = 12)
## initial value -0.833682
## iter 2 value -1.049183
## iter 3 value -1.106150
## iter 4 value -1.144395
## iter 5 value -1.153421
## iter 6 value -1.154618
## iter 7 value -1.155168
## iter 8 value -1.155888
## iter 9 value -1.157266
## iter 10 value -1.157866
## iter 11 value -1.157971
## iter 12 value -1.157987
## iter 13 value -1.157987
## iter 14 value -1.157988
## iter 15 value -1.157988
## iter 16 value -1.157989
## iter 17 value -1.157989
## iter 18 value -1.157989
## iter 18 value -1.157989
## iter 18 value -1.157989
## final value -1.157989
## converged
## initial value -1.157964
## iter 2 value -1.159201
## iter 3 value -1.160147
## iter 4 value -1.160440
## iter 5 value -1.160506
## iter 6 value -1.160730
## iter 7 value -1.160796
## iter 8 value -1.160805
## iter 9 value -1.160806
## iter 9 value -1.160806
## iter 9 value -1.160806
## final value -1.160806
## converged
fit5
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 sar1 sma1
## 0.1941 -0.5578 -0.0008 -0.8647
## s.e. 0.0953 0.0813 0.0427 0.0211
##
## sigma^2 estimated as 0.09585: log likelihood = -184.82, aic = 379.65
##
## $degrees_of_freedom
## [1] 712
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.1941 0.0953 2.0364 0.0421
## ma1 -0.5578 0.0813 -6.8645 0.0000
## sar1 -0.0008 0.0427 -0.0185 0.9852
## sma1 -0.8647 0.0211 -40.9625 0.0000
##
## $AIC
## [1] 0.5302324
##
## $AICc
## [1] 0.530311
##
## $BIC
## [1] 0.5621715
ARIMA(1,1,1) x seasonal ARIMA(2,1,1)[12]
fit6 <- astsa::sarima(cardox, p = 1, d = 1, q = 1,P = 2, D = 1, Q = 1, S = 12)
## initial value -0.832530
## iter 2 value -1.063340
## iter 3 value -1.132522
## iter 4 value -1.144263
## iter 5 value -1.158735
## iter 6 value -1.161729
## iter 7 value -1.165939
## iter 8 value -1.168425
## iter 9 value -1.169703
## iter 10 value -1.170780
## iter 11 value -1.171582
## iter 12 value -1.172552
## iter 13 value -1.172683
## iter 14 value -1.172791
## iter 15 value -1.172800
## iter 16 value -1.172899
## iter 17 value -1.172937
## iter 18 value -1.172952
## iter 19 value -1.172953
## iter 20 value -1.172953
## iter 20 value -1.172953
## iter 20 value -1.172953
## final value -1.172953
## converged
## initial value -1.160319
## iter 2 value -1.160958
## iter 3 value -1.161088
## iter 4 value -1.161134
## iter 5 value -1.161265
## iter 6 value -1.161279
## iter 7 value -1.161281
## iter 8 value -1.161285
## iter 9 value -1.161288
## iter 10 value -1.161289
## iter 11 value -1.161289
## iter 11 value -1.161289
## iter 11 value -1.161289
## final value -1.161289
## converged
fit6
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ma1 sar1 sar2 sma1
## 0.1966 -0.5616 -0.0098 -0.0357 -0.8558
## s.e. 0.0948 0.0807 0.0444 0.0429 0.0249
##
## sigma^2 estimated as 0.09574: log likelihood = -184.48, aic = 380.95
##
## $degrees_of_freedom
## [1] 711
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.1966 0.0948 2.0746 0.0384
## ma1 -0.5616 0.0807 -6.9606 0.0000
## sar1 -0.0098 0.0444 -0.2205 0.8256
## sar2 -0.0357 0.0429 -0.8329 0.4052
## sma1 -0.8558 0.0249 -34.4245 0.0000
##
## $AIC
## [1] 0.5320593
##
## $AICc
## [1] 0.5321773
##
## $BIC
## [1] 0.5703862
fit6 <- astsa::sarima(cardox, p = 0, d = 1, q = 3,P = 0, D = 1, Q = 1, S = 12)
## initial value -0.829273
## iter 2 value -1.067982
## iter 3 value -1.105678
## iter 4 value -1.133375
## iter 5 value -1.139032
## iter 6 value -1.139763
## iter 7 value -1.140711
## iter 8 value -1.140792
## iter 9 value -1.140798
## iter 10 value -1.140799
## iter 10 value -1.140799
## final value -1.140799
## converged
## initial value -1.157489
## iter 2 value -1.161065
## iter 3 value -1.161857
## iter 4 value -1.162557
## iter 5 value -1.162659
## iter 6 value -1.162701
## iter 7 value -1.162702
## iter 8 value -1.162702
## iter 8 value -1.162702
## iter 8 value -1.162702
## final value -1.162702
## converged
fit6
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## include.mean = !no.constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 ma2 ma3 sma1
## -0.3691 -0.0231 -0.0752 -0.8661
## s.e. 0.0379 0.0413 0.0387 0.0189
##
## sigma^2 estimated as 0.09547: log likelihood = -183.47, aic = 376.93
##
## $degrees_of_freedom
## [1] 712
##
## $ttable
## Estimate SE t.value p.value
## ma1 -0.3691 0.0379 -9.7284 0.0000
## ma2 -0.0231 0.0413 -0.5594 0.5761
## ma3 -0.0752 0.0387 -1.9451 0.0522
## sma1 -0.8661 0.0189 -45.9099 0.0000
##
## $AIC
## [1] 0.5264396
##
## $AICc
## [1] 0.5265182
##
## $BIC
## [1] 0.5583787
The first 3 fits have spikes in the ACF of Residuals and also have bad p-values for Ljung-Box statistic. Fits 4, 5 and 6 all are good models. However, fit 6 is the best model out of the rest with the lowest AIC and BIC values. We can also see from fit 6 model that all the plots have good results. The standardized residual plot shows no pattern and the ACF of residuals doesnt have any significant values. We can also see from the Q-Q plot that its approximately normal. Lastly, the p-values for Ljung Box statistic show no sign of autocorrelation.
Using the model selected above we can predict the future values for the next year.
astsa::sarima.for(
xdata = cardox,
n.ahead = 12,
p = 0,d = 1,q = 3,
P = 0,D = 1,Q = 1,
S = 12
)
## $pred
## Jan Feb Mar Apr May Jun Jul Aug
## 2018
## 2019 410.4176 411.0685 412.0005 413.4635 414.1212 413.3836 411.6374 409.5831
## Sep Oct Nov Dec
## 2018 409.2518
## 2019 408.1255 408.4259 410.0283
##
## $se
## Jan Feb Mar Apr May Jun Jul
## 2018
## 2019 0.3653471 0.4107911 0.4425241 0.4721290 0.4999840 0.5263670 0.5514893
## Aug Sep Oct Nov Dec
## 2018 0.3089864
## 2019 0.5755161 0.5985791 0.6207860 0.6422254
To assess the quality of the model, we can visually inspect the fit of the model to the original data. The plot above shows that the model fits the data reasonably well, capturing both the overall trend and the seasonal fluctuations.We can also see that the standard errors of the prediction are low and relatively consistent to one another. This is a good indication that the model is a good quality model.